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Abstract 

A new thresholding strategy for the estimation of a deterministic image immersed in noise is introduced. The 
threshold is combined with a wavelet decomposition, where the wavelet coefficient of the image at any fixed value 
of the decomposition index is estimated, via thresholding the observed coefficient depending on the value of both 
the magnitude of the observed coefficient as well as the magnitudes of coefficients of a set of additional images 
calculated from the observed image. The additional set of images is chosen so that the wavelet transforms of the 
full set of images have suitable deterministic and joint stochastic properties at a fixed scale and position index. Two 
different sets of additional images are suggested. The behaviour of the threshold criterion for a purely noisy image 
is investigated and a universal threshold is determined. The properties of the threshold for some typical deterministic 
signal structures are also given. The risk of an individual coefficient is determined, and calculated explicitly when 
the universal threshold is used, and some typical deterministic signal structures. The method is implemented on 
f-H , several examples and the theoretical risk reductions substantiated. 

CO 
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I. Introduction 

■ r ■ i HIS paper treats the problem of estimating an unknown deterministic image immersed in noise. The proposed 

M estimation procedure will be based on a separable wavelet decomposition of the observed image that is 
^ [ augmented by a set of wavelet decompositions of additional images calculated from the observed image. The 

■ wavelet coefficients of the full set of images at any fixed value of the scale and position are used to estimate 
O . the wavelet transform coefficient of the deterministic image at the given scale and position. The transform is 

then inverted and the spatial domain image estimated. In 1-D signal estimation Donoho and lohnstone [1], [2] first 
proposed estimation of a noisy deterministic signal using the wavelet transform. The success of such decomposition 
based methods mainly relies on the deterministic and stochastic properties of the observed or noisy decomposition 
coefficients at any fixed index value. In the simplest form the estimation procedure roughly corresponds to separating 
G ■ 'clean' and 'noisy' coefficients into two subsets, where the 'noisy' coefficients are eliminated or subjected to some 
J> . form of shrinkage [3]. Often each coefficient is estimated separately at any given index value and for example the 
I procedure may correspond to eliminating the coefficients whose magnitudes do not exceed a given threshold. A 
j_j • possible choice of threshold is the universal threshold, constant across coefficients, that for large sample sizes gives 



O 



a risk close to that given by using an 'oracle,' i.e. knowing whether a coefficient should be eliminated or retained 
[1]. A slightly different definition is given to the universal threshold by the authors of [4], that we shall use. If 
the decomposition is highly compressed hard thresholding combined with the universal threshold will achieve very 
good estimation in terms of low mean square error. 

Naturally, to achieve optimal compression for locally simpler structures, such as 1-D behaviour embedded in a 
2-D image, whilst still being able to represent varied signal structure in 2-D, the decomposition algorithm becomes 
more complicated. If a very simple decomposition method is used, then determining the statistical properties of the 
observed coefficients is easily done, and the decomposition can be found without major computational expense. The 
draw-back is that in general the mean square error of the estimation will, unless the estimation procedure is more 
complicated, with a very simply decomposition often increase due to lack of compression. Hence a trade-off must 
be found between the choice of decomposition and the appropriate treatment of the coefficients of the observed 
signal to form estimates. This compromise will naturally vary with the assumptions placed on the observed signal. 
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Fig. 1. Row 1: estimating a section of the noisy boat image (SNR=5.56), with the noisy image (a), the usual hard thresholding estimate 
(b), the hypercomplex estimate (c) and the HMM estimate (d). The hypercomplex estimate in (c) achieves greater continuity, see for example 
the mast, than that achieved by (b). The HMM estimate has more noise left in the reconstruction, a feature also observed by Starck et al. 
[5] [p. 680]. Row 2: a section of one of the bands of the noisy Tiffany image (SNR=8), with the noisy image (a), the usual hard thresholding 
estimate (b), the hypercomplex estimate (c) and the HMM estimate (d). Observe the curved loose strand of hair, and that the HMM method 
reconstructs the background with some noisy artifacts. 

As the variational structure in 2-D is a great deal richer than in 1-D, many different methods have been developed 
to achieve optimal compression for given image structures, and for particularly successful examples see work by 
Starck et al. on curvelets [5], work by Donoho on wedgelets [6] or work by le Pennec and Mallat on bandelets 
[7]. An important feature of all these decompositions is the representation of an image in terms of coefficients 
associated with a given spatial position, and length scale. The coefficients are considered 'local' to such positions 
and scales. 

Methods that achieve a substantial degree of compression can afford to treat each decomposition coefficient 
individually and without a great deal of sophistication. To achieve better estimation of coefficients for a simplistic 
decomposition method treating decomposition coefficients simultaneously may also give improved estimation. This 
may correspond to full likelihood based or similar methods such as those proposed by Jansen and Bultheel [8], 
note also work by Johnstone and Silverman where the local sparsity of decomposition coefficients is discussed 
[9], as well as usage of specific known coefficient structure in the decomposition of a deterministic signal across 
coefficients: see for example Cai and Silverman [10], Dragotti and Vetterli [11], Pizurica et al. [12], Crouse et al. 
[13], Fryzlewicz [14] and Olhede and Walden [15]. By modelling continuity across coefficients in terms of their 
local index, estimates with a reduced mean square error may be obtained, that frequently correspond to better visual 
reconstructions of the image. If the method of estimating the decomposition coefficients is not very complicated 
but still captures continuity across the decomposition index well, then we may choose a decomposition of the 
image that is not optimal in terms of compression of the deterministic image, but that is computationally cheap 
to implement, and where the estimation of the decomposition coefficients may be treated carefully. We may then 
achieve a reduced mean square error in the estimation of the image at a low computational expense. The purpose of 
this paper will not be to develop an optimal decomposition algorithm, but instead to improve the estimation of the 
decomposition coefficients, without complicating the procedure substantially. We shall base our image estimate on 
the 2-D separable wavelet transform coefficients, extending 1-D methods of utilizing coefficient structure to 2-D. 
We shall make the developments precise by discussing the risk of a given estimated decomposition coefficient. 

In 1-D Dragotti and Vetterli [11] explicitly model signal structure as a polynomial function plus some dis- 
continuities and jointly estimate the full set of coefficients corresponding to a discontinuity. Pizurica et al. pool 
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information regarding joint structure in 2-D across coefficients via estimating the local Lipschitz constant and this 
information is used to estimate the probability that a wavelet coefficient contains contributions from a signal. Crouse 
et al. [13][p. 887] model signal presence across coefficients in 1-D in terms of clustering and persistence, i.e. if 
a coefficient is non-null at a given decomposition index, then coefficients that are "close" to this index are also 
non-null. A similar strategy is adopted in 1-D by Cai and Silverman [10], whilst Fryzlewicz [14] considers the 
magnitude of any additional arbitrary coefficient when estimating a given coefficient. Fryzlewicz established the 
risk of this strategy, determined from the mean and covariance matrix of the two coefficients. Fryzlewicz's treatment 
is very general and instructive. 

In a similar spirit to some of the aforementioned methods in 1-D Olhede and Walden [15] considered the 
thresholding of an individual wavelet coefficient based on the magnitude of the observed coefficient and the 
magnitude of the decomposition coefficient of the Hilbert Transform (HT) of the signal, denoting this method 
'analytic' denoising. As both the HT and the wavelet transform are linear the strategy can be viewed either as 
constructing a second out-of-phase replication of the original signal and finding its local decomposition, or as 
forming a weighted average of coefficients of the same scale that are nearby in time and using this magnitude to 
determine if there is local signal presence. The latter strategy is similar in spirit to block thresholding, but instead 
of using a local magnitude calculated from an average of squared adjacent wavelet coefficients at the same scale 
in the thresholding procedure, the square of a weighted average of coefficients with a weighting of O (l / (2 J A;)) is 
used. For 'analytic' thresholding to work well the wavelet coefficient of the HT of the deterministic signal must be 
large when the wavelet transform of the signal should not be estimated by zero, and the distribution of the wavelet 
coefficient of the HT of the noise needs to be jointly determined with the wavelet coefficient of the noise at the 
same scale and time. As the HT can be considered to have the same time-frequency structure as the original signal 
the wavelet transform of the signal and the HT of the signal should be large concurrently. Olhede and Walden [15] 
determined that the wavelet transform of the noise and its HT were approximately uncorrelated at a fixed time 
and scale, and supplied an appropriate universal threshold for 'analytic' denoising. Figure |3a) shows the risk of 
a given coefficient using 'analytic' denoising, based on the wavelet transform of the signal and its HT taking the 
same magnitude. The figure verifies that in the case of equivalent means the risk of an 'analytic' hard thresholded 
coefficient estimate with a universal threshold is less than that of a hard thresholded estimate with a universal 
threshold. Thus improvements to standard denoising in 1-D can be obtained by implementing this procedure. 

We seek to extend 'analytic' thresholding to 2-D, and this will in general require defining additional images, 
serving the same purpose as the HT did in 1-D. The HT was useful in 1-D, as it has the same time-frequency 
structure as the original signal, something we discuss in section IIB, and also the joint statistical properties of the 
wavelet transform of noise and its HT at a fixed scale and position was easily determined. In 2-D there are many 
possible extensions to the HT, where in each case more than a single additional component is defined. We refer to 
such components as quadrature components, that are introduced and discussed in sections IIB and IIC. There is 
more than one extension because variation in the image can either be locally uni-directional, and associated with a 
given direction, or occurring in several directions simultaneously (see Olhede and Metikas [16] for a more complete 
discussion of this topic). We investigate the usage of two possible HTs: the Riesz Transforms (RTs, section IID) of 
the image or the tensor products of the HT in 1-D with the identity filter, denoted the HyperComplex transforms 
(HCTs, section HE). We define the local magnitude of the wavelet coefficients from the quadrature components 
(section IIIA), and propose a threshold criterion to estimate the wavelet coefficients of the image. Once the wavelet 
transform is inverted this yields an estimate of the image, and this method is denoted hyperanalytic denoising. 

We discuss the properties of the local magnitude for stylized image structure: i.e. the behaviour of the threshold 
criterion for oscillatory structures and edges (section IIIB). We discuss the choice of threshold, and an appropriate 
universal threshold for correlated threshold criteria (section IIIC). We determine the approximate distribution of the 
decomposition of the Riesz and Hypercomplex components of noise alone at a fixed value of the indexing (section 
HID), and this allows us to determine universal thresholds for both the RT and HCT based methods (section HIE). 
We calculate the approximate risk associated with the two different thresholding strategies with a given threshold 
(section IIIF), and discuss the value of the risk of the different procedures for certain scenarios. 

We implement the procedure on several examples (section IV), and compare results with the Hidden Markov 
Model method (HMM). We observe that a reduced mean square error is obtained from using the proposed image 
denoising strategies, and discernable improvements in the visual reconstructions. Hyperanalytic denoising is thus 
shown to give a simple and computationally competitive method of improving existing denoising strategies. 
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II. Image Model 

A. Image Structure 

We model the observed image [Y x ] x for x = [xi,X2] T £ D, where D = [0,N — l] 2 , and Ax denotes the 
sampling period via: 

Y x = q(x 1 Ax,x 2 Ax) + e x , xED. (1) 

We collect the observed image in a matrix Y = [Y x ] xeD , and similarly define q = [q x ] x( z D = [q(xiAx, X2Ax)] x< - D , 
as well as e = [e x } xeD ■ The noise is modelled by e x ~ N (0, a 2 ) , where ~ denotes distributed as, and 
Cov(e x ,e y ) = cr 2 <5 xy , x, y G D, i.e. the noise is Gaussian, uncorrected and isotropic. A decomposition 
of the image in terms of a wavelet basis [17] is formed via 

g(xAx) = J2 <?.Ai,k(x) + £ W$ >k to(x) + £ U^ k r ( ,, k Sx) + £ ^^(x), (2) 

jM j,k jr',k k 

where ipj^^x), ^j,2,k(x), ^j,3,k(x), and ^-^(x) are the tensor products of functions ipj^x) and fy^x), re- 
spectively. = wj 9 4 k is then associated with smooth behaviour in the image g(x) in the variables x\ and 

%2, Wjl^k * s associated with smooth behaviour in x 2 and rapid variation in x\, etc, where u = 1, ... ,4 denotes 
the tensor product index, j is associated with scale 2 J , where 1 < j < Jq = lg(N), whilst k is associated 
with a spatial localisation in the plane. If an image with N 2 coefficients is observed, then for any fixed value j, 
< ki < Nj — 1, 1 = 1, 2, where Nj = N '/2>. For simplicity collect the indices in a vector- valued index of 



^ = [j, u, k] . The full set of coefficients 

The DWT is usually implemented by repeated fi 



W {q \ 



is the Discrete Wavelet Transform (DWT) of q(-). 
tering of the observed signal with two special filters, the scaling 



filter {gi : / = 0, . . . , L — 1} and the wavelet filter {hi : I = 0, . . . , L — 1} , in both spatial directions separately. We 
initialise the transform by equating the image with the finest scale representation of the image, i.e. Vq^ X2 = q Xl ,x 2 - 
The transform at index £ can also be implemented using a single filter h^%. The decomposition is halted at level 

j = J < Jo = lg(N), and the scaling coefficients |vj k j are determined at this level to complete the representation. 
Hence for j < J only [Wj-^J for u = 1, 2, 3 are calculated. For more details on the DWT, see for example Percival 
& Walden [18], whilst a good exposition of image decompositions can be found in Mallat [17]. Having observed 
Y x rather than q x we calculate the DWT coefficients and threshold these to obtain an estimate of W^ q \ 

denoted . Wavelets will compress images of sufficient regularity, a statement that can be made precise in terms 
of Besov spaces, but for some locally simple image structures, a more compressed representation can be made [5], 
[7]. Hence for images containing say edges the deterministic image energy in the DWT will be spread over more 
coefficients than strictly necessary, and as the magnitude of the affected coefficients will be less than the coefficients 
representing the same structure in a more compressed alternative decomposition it is important that the estimation 
procedure does not fail to retain signal generated coefficients. 



B. Quadrature Components 

In one version of the 1-D estimation algorithms suggested by Cai and Silverman [10], the coefficient at scale 
j and position k was estimated using a shrinkage rule depending on the combined magnitude of the observed 
coefficient at [j, k] and the magnitude of the immediate time-neighbours at the same scale, i.e. at [j, I] for / / k. 
This procedure will perform well if a signal contribution present at the [j, k] index exhibits clustering in adjacent 
coefficients, i.e. the wavelet coefficients will have large means at and the noise is uncorrected over / / k. 
The scale adjacent coefficients at a given time point have been used to improve estimation [13], [14], and Olhede 
and Walden [15] used the wavelet decomposition of the HT of the observed image to this purpose. We seek to 
generalise the method in [15] to 2-D, and discuss some of its properties, before proceeding to do so. 

To simplify the discussion of the HT, let the Fourier Transform (FT) of a d dimension signal g(x) be denoted 
by: 

Q(f)= / q(x)e 2in{T * d d x= |Q(f)|e- 2i ^ f \ 

JK. d 



STATISTICS SECTION TECHNICAL REPORT TR-06-01 



5 



this defining the magnitude (|Q(f)|) and phase (tp q (i)) of g(x) in the Fourier domain. Given a 1-D signal q(x) the 
HT in both the time and frequency domain are defined by: 



Hq(x) = -T dy, (HQ) (/) = (-i)Q (/) sgn(/), 

7T x-y 



(3) 



and the transform can be approximated suitably for discrete implementation (see [15]). Usually q(x) and TLq{x) 
are collected into a complex-valued representation, denoted the analytic signal, given by q^ + \x) = q(x) + iHq(x). 
If q(x) = cos(27r/'x) then q( + \x) = exp(2irif'x), but sometimes too much emphasis is put on this description 
of oscillatory signals, to the extent where the HT is almost discounted in usage when the observed signal is not 
oscillatory. Even if q(x) does not correspond to an oscillation, the HT can be considered to enjoy certain properties, 
such as: i) 7iq(x) is orthogonal to q(x), i.e. f Ttq*(x)q(x) dx = 0, ii) the magnitude of the HT of q(x) at any 
given frequency / ^ is identical to that of the orig inal signal, i.e. \Q(f)\ 2 = \HQ{f)\ 2 , iii) the HT is linear 
in the signal, and iv) the HT of a signal can be considered as having the same time-frequency signature as the 
original signal, i-iii) immediately follow from equation (J3jl, and ensure that the distribution of the DWT of the HT 
of white noise at a given value of [j, k] is asymptotically identical to that of the DWT of the original noise, and the 
two wavelet coefficients are approximately uncorrected [15]. The fourth property merits some further discussion. 
Clearly the notion that a signal and its HT have the same time-frequency structure is accepted in signal processing, 
as the analytic signal, rather than the real signal, is used to construct time-frequency representations of a real signal. 
For example usage of the Wigner-Ville distribution rather than the Wigner distribution is generally advocated [19]. 
As may be noted from equation © TL(q){x) has at all frequencies / ^ exactly the same frequency support as 
q(x), whilst the spatial support of q(x), has been spread out by the convolution with l/(irx). 
The HT is usually interpreted as a phase-shift of tt/2 to signal q(x). Note that we may write: 



q(x) = 2 / |Q(/)| cos(2vr(/x - <^(/))) df, Hq(x) = 2 / \Q(f)\ S in(27r(/x - <p q (f))) df. (4) 
Jo Jo 

Hence the same magnitude of |Q(/)| is assigned to each frequency /, and the contribution previously associated 
with cos(2tt (<p q (f) — fx)) is now shifted in cycle or phase by tt/2. Thus in some sense, we are recovering the 
same signal, as the frequency description is the same, but there has been a very slight shift in time alignment. 
Thus TLq{x) should have roughly the same time-frequency support as q(x). This implies that the DWT coefficient 
of q{x) should have about the same magnitude as Tiq(x), as the DWT forms a time-frequency decomposition of a 
given signal. The DWT is compact in time, and we wish to encourage using time information in nearby locations 
when estimating a given coefficient in analogue with Cai & Silverman. Of course once the coefficient has been 
estimated, the estimate of the signal will be based on the thresholded wavelet coefficients of the observed signal 
alone, and thus discontinuities can still be reconstructed. 

Given the nice deterministic and stochastic properties of 'analytic' denoising, it is not strange that we seek to 
generalise the concept to 2-D. A first step in this procedure is the definition of linear transformations of the image 
that will serve the same purpose as the HT did in 1-D. The HT and the signal formed a natural representation in 
terms of the 'analytic' signal, where the real and imaginary components were phase shifted versions of each other, 
or we may denote the latter two signals as being 'in quadrature,' i.e. as representing out-of-phase replications of 
the same structure. Their magnitude squares represent the local presence of the signal well but we stress that even 
if the signal is not oscillatory, the interpretation of the HT as having roughly the same time-frequency support still 
rests on the above arguments. We shall denote the signal and its HT as quadrature components, and will define the 
2-D generalization of this two-component signal collection. 

Definition 2.1 (Quadrature Components): We denote by Quadrature Components of q(x) any set of images 
|^(s,0( x )| ; i , where s denotes the specific transform used in the construction of the components that satisfy: 

1) each (/( s '')(x) is orthogonal ('out of phase') to the original signal q(x) = q( s '°\x), or 
J^o J^oc Q*(x)g( s '^(x) d 2 x = 0, and for all separable gs(x) = qi(xi)x2(x2), also 

2) the combined energy assigned to each frequency f from the full set of quadrature components at all points of 



2 



f G K 2 except for a finite set of frequencies satisfies the equation z~2t=i Q^'^(f) = |Q(f)| z , where 

(s) 

< CI < oo is constant and, 
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3) each g( s '^(x), for I = 1, . . . , L, is constructed by a linear transformation of g(x), 

4) the space and spatial frequency support of [^'^(x), . . . , q( s > L )] , for I = 1, . . . ,L is similar to that of g(x). 
We form the DWT of all L + 1 images, |#£ s '^| , and define: 

W^ 8 ' l) = wf ( "" ) , Z = 0,...,L. (5) 

The linear operator that constructs object q( s ' l \x) from q(x) will be denoted V*- 5 '^ and the transformation is 
implemented in the spatial domain using the kernel v (x) that once the integral is approximated using a Riemann 
sum is replaced by a linear filter Vp' 1 ^. The FT of t>( s,/ )(x) is denoted V^ s ' l \f) whilst taking the FT of v^x yields 

the object (f). The discrete implementation of the calculation of the quadrature components is outlined in 
Appendix A. 



C. Stochastic Properties of Decomposed Quadrature Components 

We establish the stochastic properties of the wavelet decomposition of noise alone, and for this purpose define 
at a fixed value of £ : 

h. (6) 



n (s,u) 



Proposition 1 (Energy of Quadrature Components ): At a fixed index value £ = [j, u, k] T the total energy of 
the DWT of the quadrature components of white noise with variance a 2 is given by: 

E (j2 n^A = c[V + O (1/N) , «= 1,2,3, 4. (7) 

Proof: See appendix C. The error term follows from the Riemann approximation to the integral. ■ 
Proposition 2 ( Covariance of Transforms of the Signal and Its Quadrature Components ): At a fixed index value 
£ = [j, u, k] T the covariance of the DWT of white noise, and the DWT of any of the quadrature components of 
the white noise if of order 0(1/N). 

Proof: See appendix C. The error term follows from the Riemann approximation to the integral. ■ 
Thus at any given value of £ the DWTs of e x and |ex'^| are approximately uncorrected, and the combined 

energy of the DWTs of je^' } is a multiplicative constant of the energy of the DWT of e x . Thus the squared 
magnitudes of these objects have a tractable joint distribution. Condition |3] ensures that we may assume that the 
mean of the DWT of the observed image will be simultaneously large to the mean of the DWTs of the quadrature 
components of the observed image at a given value of the index £. Of course whilst the general definition of 
'quadrature components' may then seem justifiable, this does not guarantee the existence of such objects. We shall 
give two different specific examples of quadrature components based on extending the HT to 2-D, and discuss their 
properties. We base the set of quadrature components on hyperanalytic functions, see [16]. 



D. The Riesz Transforms 

The Riesz Transforms (RTs) have been used in combination with the wavelet transform by Metikas and Olhede 
[16], [20]. Denote the convolution of two functions (?i(x) and 92 (x) by [q\ **g 2 )(x) = J J R2 (?i(y)q , 2(x — y) d 2 y. 
The RTs of q(-), denoted (^'^(x) and g( r ' 2 )(x), are obtained by convolving q(-) with the Riesz kernels v^ r ' l '(x), 
given in terms of x = \J x\ + x\ and / = sj ' f 2 + /| by: 

uW)(x) = 2^' yW)(f) = _i f' 1 = 1; 2 ' ^ W)(x) = ( v{r,l) * * q ) (x) ' 1 = lj 2 - (8) 

The RTs satisfy the conditions of quadrature components, see for example [16][p. 15-16]. Given the RTs combine 
to have the same norm as q(x), C% = 1. As in Olhede & Metikas [16] we argue that if unidirectional structure 
only is present in the image with orientation u, i.e. the image admits the representation for n = [cos(u) sm(u)] T 
and v G (0, ir) of: 

roo 

qu(x)= G l /(/)cos(27r/n T x)d/, (9) 
J 
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then the interpretation of the RTs is simplified. We use polar coordinates and set f = / [cos(4>) sin(</>)] . Then 

G ( f) 

the Fourier transform of <ft/( x ) is Qui?) = 2 [^(^ ~~ v ) + ^(^ + ^)] an ^ we fi n ^ : 

/>oo 

= [cos(z/) sin(i/)] / C7 {/ (/) S in(27r/n T x)4f. (10) 
J 

Thus the two quadrature components represent the same 1-D directional variation as (fty(x), with the same di- 
rectionality as <7t/(x, ) but where the variations in direction v have been shifted in phase and multiplied by a 
constant factor. Thus (informally) for unidirectional variation the Riesz transforms have the same spatial and spatial 
frequency support as the original signal. Note that we are not assuming that q(x) is periodic or oscillatory. 

E. The Hypercomplex Transforms 

A second set of 2-D HTs are the HyperComplex Transforms (HCTs), defined as tensor products of the identity 
transform and the HTs. By Olhede and Metikas [16][p. 12-13], it is shown that the hypercomplex transforms give 
a valid set of quadrature components, and note C3 = 3. Denote the partial HT [21] in direction xi by Hi. Three 
additional quadrature components are defined by: 

^(x) = m {q} (x), ^(x) = H 2 {q} (x), ^(x) = H 2 Hx {q} (x). (11) 

If the image is naturally expressed as separable in the frame of reference the three HCTs of qs(-) are by trivial 
extension of equation ©, the same signal shifted in phase in the two axes. Of course the purpose of this paper 
will be to alleviate problems (see for example Starck et al. [5] [p. 671] ) when estimating nonseparable images 
based on coefficients calculated in a separable decomposition whose energy spread across more coefficients than 
strictly necessary. Assume q(x) is non-separable then define its Partial FT (PFT) in direction x\ by: Qi{fi,x 2 ) = 
\Qi{h,x 2 )\e- 2lT ^^' x ^ = q(x)e- 2i7T f lXl dx u so 

g(x) = 2/ \Qi{h,x 2 )\cos{2Tr{f x xx- Vl {h,x 2 ))dh (12) 
J 

poo 

H iq (x) = 2/ \Qi(fi,x 2 )\cos(27r(f 1 x 1 - ( p 1 (f 1 ,x 2 )-n/2)dfi. (13) 
J 

Thus q( h,1 \x.) corresponds to replicating all variation in x%, for any fixed value of x 2 , but shifted in phase, and 
mutatis mutandis the analogous statements hold for q( h ' 2 \x.) and q^ h ' 3 ^(x). If q(x) corresponds to a particular 
time-frequency structure as a signal in x\ for fixed values of x 2 then q^^'fa) will replicate the same structure, but 
shifted in phase, in analogue with equation ©. We propose to use the decomposition coefficients of the three HCTs 
of the observed image to estimate the decomposition of the deterministic image. An improvement in estimation 
will ensue if the magnitudes of the decomposition coefficients of the HCTs are large when the coefficient of the 
observed image should be kept rather than killed. Given each coefficient replicates the same variational structure 
in each separate axes this should be the case, cf equation ©. For simple 1-D structures such as line segments 
observed in 2-D the energy of the image will be spread over more coefficients than strictly necessary. By defining 
the additional images that should have the same marginal variational structure as the images in each of the two 
axes, for moderate SNRs the estimation should improve by using the additional components, as more often the 
signal is recognized as present. Subsequent risk calculations in section UlI-FI show that if the DWT of the quadrature 
components have the same mean as the DWT of the original signal, then the risk can be reduced by a new procedure 
that uses the magnitudes of all four components at each fixed £, to threshold a given coefficient. The authors of [15] 
denoised a 1-D signal by defining a second component as the HT of the observed signals, and using the DWT of 
this component when thresholding the observed signal. A naive 2-D extension of this method would define a single 
extra quadrature component corresponding to q( h & (x) , phase shifting in both spatial directions simultaneously. We 
discuss the risk of this procedure in section UlI-FI and it is shown to exceed that of the proposed method, for certain 
scenarios. 
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III. Estimation 

A. Defining Estimates 

We have argued that the quadrature components defined either by the HCTs or RTs have the same space and 
spatial frequency structure as the original image, shifted in phase. Therefore the mean of the DWTs of the quadrature 
components should be the same as the DWT of the signal. We define a local magnitude in terms of the DWTs of 
the full set of quadrature components. 

Definition 3.1 (The Magnitude of a Coefficient ): We define the magnitude of a coefficient using quadrature 

components denoted by s via: m! 9 '^ 2 = { } } Ylb=o wi q ' s ^ 2 . 

s C T +1 s 



Let for some fixed L\ € N B\ = {k : k = i + l, 1 = -L 1: . . . Li} , then M^ q,s)2 is a 2-D analogy to to 5? 
J2ieBi ^fip use< ^ anc ^ Silverman [10][p. 132], to block threshold. Each coefficient Wg will be estimated 

(Y s)2 

by hard thresholding the observed coefficient depending on the value of Mg ' 

OH M (Y,s)2 > *> ) 2 

The notation given £ is fixed when estimating any set coefficient is needlessly complicated, and we remove the refer- 



ence to most of these indices. We define: w( s > u ^ 



(»,0) w(Y,s,L) 



T 



W^' S '°\...,W^ S ' L) 



T 



note that /ii = /x[ s ' M ^ does not depend on the choice of s and we denote the estimator using the s indexed components 

by: ^' U) (A 2 ) = W^ q ' s) (X 2 ). 

B. Magnitude of Typical Image Features 

Deterministic images are frequently modelled as the combination of texture and contours (see for example work 
by Vese and Osher [22] modelling a signal as a bounded variation contribution plus a texture contribution). We 
consider observing an image that is an aggregation of edges and texture, where each texture component is modelled 
by t(x) = at(xo) cos(2-7rf,^x), and each edge component is modelled by e(x) = a e (xi)5(cos(9)xi + sin(#)x2 — c). 
a e (-) and at(-) are assumed to be slowly varying. In general we do not expect to observe sinusoids or discontinuities 
that for very slowly varying a e (-) span the entire observed image, but to be able to carry out theoretical calculations 
stylized image structures must be analysed, that observed images would approximate. We shall investigate how the 
magnitude of the transform coefficients of the full set of quadrature components behave, as our subsequent risk 
calculations will demonstrate that the success of the method strongly depends on the mean of the quadrature 
components. 

To this purpose we define the Maximum Overlap Discrete Wavelet Transform Coefficients (MODWT coefficients) 
W^' s ' l) . These are the DWT coefficients calculated without subsampling, and where a new normalisation is 
introduced at each level j to preserve energy. For a full length discussion see Percival and Walden [18][Ch. 4]. 
We denote the FT of the MODWT filter h j>u>k by H jtV (f) = H jtV (f) e -2^,4f) ; thus defining the modulus 

(\Hj iU (f)\) and phase (<^, )U (f)) of Hj >u (f). For notational convenience let x = 2 jf (k + 1) — 1, and the region of 
frequency space where Hj )U (f) is mainly supported be denoted Qj >u . The DWT coefficients of a generic signal g(x) 
can be extracted from the MODWT coefficients of <?(x), using the relations = ^'W^* x (see for example 

Percival and Walden [18][p. 203]). 

Lemma 1 (RT Magnitude of Local Oscillation ): If the signal locally takes the form 
t(x) = at(xp) cos(27rf^x) with fo = [/ocos(0o) /o sin(0o)] T , then the magnitude of the wavelet decomposition 
defined by definition 13. II is given by: 

M {t,r)2 = 2 2i-l a 2 (xo)J(foe ^. u)+pi + /M ) (15) 

where p\ , is an error term depending on the leakage of the wavelet filters in the frequency domain. If a sufficiently 
long wavelet filter is used, this term can be ignored. See Nielsen [23] for more discussion on avoiding leakage. 
Proof: See appendix B. ■ 
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Lemma 2 (HCT Magnitude of Local Oscillation ): If the signal locally takes the form 
i(x) = at(xo) cos(27rfJ"x) with fo = [/o cos(</>o) /o sin(0o)] T , then the magnitude defined in definition 13.11 is 
given by: 

M (U)2 = 2 2,-l a 2 (xo)J(foe ^ >)+p2 + ^y (16) 

where p2, is an error term depending on the leakage of the wavelet filters in the frequency domain. 

Proof: See appendix B. ■ 
Lemma 3 (RT Magnitude of Discontinuity ): If the signal locally can be approximated by 
e(x) = a e (xi)5(cos(9)xi + s'm(9)x2 — c), then the magnitude defined in definition 13. II is given by: 

= 2 j A e {0) J" 1 '"™ Hj^cos^ux, sin(0)iyi) ^-^(cosW^+sin^-c) ^ +p3 + 
= 2U e (0)(-i) r i '™\gn{v l )H j)U {cos{e)v l ,sm{9)v l )e 2 ™^^^ 

J fl.min 

+P4 + o(l), M^ = \w^ + \u^ + P , + 0{^) (17) 

where p^, p^ and p^ depend on the smoothness of a e (-), whilst v-\ jTn - m and ^i,max are given in appendix B. 

Proof: See appendix B. ■ 
Lemma 4 (HCT Magnitude of Discontinuity ): If the signal locally can be approximated as a discontinuity 
e(x) = a e (xi)5(cos(9)xi + sin(#)x2 — c), then the magnitude defined in definition 13. II is given by: 

Mg*» = + + P6 + O (1) , (18) 



where p$ depends on the smoothness of a e (-), and the forms of and Ug are given in lemmaQ] 

Proof: See appendix B. ■ 
For oscillatory signals the magnitude hence aptly reflects signal presence at £. Equation (fTTl illustrates the problem 
experienced by an edge in a 2-D separable representation: only if 8 = or 9 = ir/2m for m € Z will the edge 
live in 0^2 or %,3, i.e. constant in one direction and variable in the other. Only in this case will the representation 
be extremely compressed (note that the proof needs to be adjusted for 9 = 0). From equation dl7l l we note 
that the compact spatial support of ensures that the energy of U^ e) and W^ e) are mainly concentrated near 
cos(6>)xi + sin(6>)x2 = c. Given we may represent Hj jU (-) in terms of a magnitude and a phase, the difference 
between <pj u (cos(9)u, sm(0)v) and 27r^(cos(#)xi + sin(#)x2 — c) will determine exactly at which spatial indices 

(e) ' (e) 

Wf and Uf have non-negligible magnitudes (see for example the discussion in Gopinath [24] [p. 1794]). Thus 
the DWT of the quadrature components will be large near the discontinuity and they can be used to improve the 
estimation. Our proposed procedure will capitalise on this fact, and it can be noticed in the reconstructions that 
line discontinuities are better reconstructed (see Figure ^ (c)), and as a curved discontinuity can be approximated 
as the aggregation of amplitude modulated line discontinuities, improvements in estimation can be observed for 
curved structures (see Figure ^ (g))- 



C. Distribution of Noise & Universal Thresholds 

The distribution of n^ s ' u ^ must be determined, to obtain a universal threshold [4]. Let K = N 2 be the to- 
tal number of coefficients of the original observed image and denote M^' max) = max 4 (c[ s) + l)M^ )2 /a 2 . 
Downie and Silverman [4] proposed that a universal threshold should in general satisfy taking a value such that 
lim^-i-ooP ^A/j^' max ^ < Xj^j = C, for some constant < C < 1, and as K increases the expected number of 
coefficients exceeding the threshold is some small but finite non-zero value. We choose a slightly more conservative 
threshold, so that if our strategy were to be used for K independent threshold criteria, then the probability that 
the maximum exceeded the threshold is O ( jj^rgjjsji ) > rather than tending to a positive constant for the Riesz 
threshold. We use for the Hypercomplex threshold a conservative version of that suggested by Downie and Silverman. 
We cannot quite achieve analogous results to Downie and Silverman as as the set of DWT coefficients n^'^ 
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are correlated across indices £, and adopt arguments similar to those given by Johnstone and Silverman [25] 
and Olhede & Walden [15], to justify the choice of threshold. We do not aim to determine the full covariance 
structure of the full set of wavelet coefficients of the observed image and the quadrature components. To derive the 
conservative threshold define M.^ as the maximum of K independent variates with the same marginal distribution 
as . We find a universal threshold based on determining the distribution of M$ > an( * this 

then constitutes a conservative choice for M^' max ' ) = max^C^ + 1)M^ /a 2 , s = r, h as 



?(M^ max) <Af) = P 



nf =1 



= P[M ( $<\ 



M. 



(s)2 



o*/(cW + 1 



< A 



(«)2 
K 



K 



> 



IP 



M. 



{8)2 



o*/(CP + 1 



< A 



(s)2 
K 



K 



is) (s)2 

by corollary 2 from Dykstra [26]. C L + 1 can be interpreted as the degrees of freedom associated with ikQ . 



As 



shown in the subsequent section, the DWT of the quadrature components for any fixed value of £ are uncorrected, 
and the A; L matrices of Dykstra are defined to take the variance of nf ,u \ into account. 



D. Distribution of the Magnitude 

var(n| s ' M ^) must be determined for s = r, h to derive the approximate distribution of n( s ' M ). We denote by = 
as equality in law [27]. 

Lemma 5 (Distribution of Riesz Coefficients ): The DWT coefficients of the original signal and the RTs of Gaus- 
sian white noise are distributed as: 



n (r,u) g zir , u ) + Q (1/Ar) ^ z ( r ,u) ^ ^ ^ ^2^ ^ a (r,u) j _ (r,u) 



u= 1,2,3,4. (19) 



where diag denotes a diagonal square matrix, a^ r ' u ^ = \, u = 1, 4, a^' 2 ) = \ + 2 tan 1 (i) + \ tan 1 (2) and 
a (n3) = 1-2 tan^ 1 (I) - I tan" 1 (2) . 

Proof: For the proof see Appendix C. ■ 
Lemma 6 (Distribution of Riesz Magnitude ): The magnitude square of the DWT of the RTs of Gaussian white 

(e r)2 

noise, denoted ' , are distributed as 



M^' r)2 c Z (r,u)T Z (r,u) 
= V 2 



+ O (l/N) ^T 1 + (l/N) , Tx ~ x\ + a (r ' M) X? + (1 - a^)xl (20) 



j = 1, . . . , J, fei, ^2 = 0, . . . , Nj, u = 1, . . . , 4, where if u = 1, 4, Ti /las distribution 

rr 

f Ti (t)=e~ t ^= f 2 e"' 2 ^, (21) 
V 71 " Jo 

whilst if u = 2, 3 f/ie moment generating function of T\ is readily calculable, and we may calculate the probability 
of obtaining large variates using formulae derived by [28]. 

Proof: For the proof see Appendix C. ■ 
Lemma 7 (Distribution of HCT Coefficients ): The DWT coefficients of the HCT of Gaussian white noise are 
distributed as 

n (h,u) £ Z ^ + 0(l/N), Z^> u ) ~ 7V 4 (0, a 2 diag [1 1 1 1]) , u = 1, 2, 3, 4. (22) 
Proof: For the proof see Appendix C. ■ 
Given the approximate joint distribution of the DWT coefficients at £ has been determined, it trivially follows 
that the magnitude is distributed as 

Mi € ' k) r Z( h > u ') T Z( h > u ) r 

" - " - + 0(1/N)±T 2 + 0(1/N), T 2 ~x\ + Xl + Xl + Xl~xl 



a 2 /A a 



2 
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E. Threshold Choice 

Lemma 8 (Riesz Conservative Threshold ): Taking 

A£ )2 (Z) = 2 log (K) + 2Clog (log (K)) , (23) 

it follows that if C > — 1, 

P (M$ < \<g\lj) 1. (24) 

Proof: For the proof see Appendix C. ■ 

From [4] we may note that the RT threshold is thus like that of a xl (C > —1) however to ensure that the 
probability tends to 1 rather than a fixed constant we take C = 0, rather than C = — 1. Given the normalised 

marginal magnitudes of the HT components are x\ we ma y use results of [4] to note that 

=21og(iO + 21og(log(iO)- 

gives an appropriate threshold. Note that yet again, we expect this to be a conservative threshold, because the 
wavelet coefficients will be correlated across As a final step of the procedure we implement cycle-spinning 
[18, p. 429], which is known to improve mean square error results considerably. Finally for completeness consider 
implementing hard thresholding in the usual fashion: this will be denoted by s = c, and we discuss using a single 
added extra component of n^' u ^ when thresholding n^ 1 as a naive extension of 'analytic' thresholding, denoted 
by taking s = a. 



F. Risk Calculations 

To compare the theoretical performance of the threshold estimators proposed in this paper, we calculate the 
standardized mean square risk at any fixed value of We define the standardized risk using any threshold procedure 
denote by s for 9i = jo by 



Rf{\) =a- 2 E 



(25) 



If s = r then we denote by t\ and r 2 the two different cases that may occur at a given £ when u = l,4oru = 2,3 
- the risk will be different in these two cases. This will not happen for s = c or s = h. For completeness we here 
also provide the risk of the 'analytic' denoising, as this was not done in Olhede & Walden [15] and corresponds 
to a special case of the risks determined by Fryzlewicz [14]. 

Theorem 1 (The Risk of a Thresholded Coefficient ): The standardized risk of an individual coefficient is using 
threshold strategy s = c, a, r, h with 9i = /4 jo and Rj(X) = J2i( w i + ®i) 2 < ^ 2 given by: 

R<f\\) = 1 + f [9l- w 2 } <f>(w) dw, R { e a \\) = l+[ [e\- w 2 } <p( Wl )4>{w 2 ) d 2 w 

R ( p{\) = 1 + ———L=== [ [e 2 x - w 2 ] <f>(wi)<f>(w2/Vrfrt)<f>(w3/Vi - a(^)) d 3 w 
y/a( r < u )(l - a( r >")) Jr 3 (\) 

R { h \\) = l+f [el-w 2 ]<t>twi)<j>tw2)<t>{wz)<t>{wt)d*w. (26) 
Jr } {X) 

Proof: The risk of an individual coefficient using standard hard thresholding has been noted by Marion et al. 
[29], whilst the risk of the hyperanalytic thresholds are derived in appendix D, and s = a is a special case of the 
bi-variate thresholding investigated by Fryzlewicz [14]. ■ 
For some examples of signal/noise distributions, the individual risk of a given coefficient is plotted in Figure |2] 
for the four estimation procedures using the universal threshold. Figure |2] (a) shows the reduced risk of 'analytic' 
thresholding compared to regular thresholding in 1-D when the means of the wavelet coefficient of the signal and of 
the HT of the signal are equal, and this then provides theoretical justification for the 'analytic' denoising procedure. 
Figures |2] (b), (c) and (d) show the risk associated with thresholding at any index £ using either the usual hard 
thresholding (c), 'analytic' denoising (a), Riesz denoising when u= 1,4 (r±) and u = 2,3 0"2) or Hypercomplex 
denoising (h). The risk is calculated with K = 256 2 and using the universal threshold. If the mean of the DWT of 
the quadrature components is of similar magnitude to the DWT of the signal then the risk is reduced. The greatest 
weakness of the proposed methods is if the means of the quadrature components are completely disparate from 
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that of the original signal, as may be noted from figure (c). The results of section IIII-BI indicate that this will 
not be the case for typical image features. The norm of a signal and its HT are identical, and given the results of 
section Ull-B I the means are unlikely to be consistently mismatched. Finally if there is no signal present we observe 
the following result. 

Corollary 1 (The Risk of a Thresholded Coefficient when there is no signal ): The risk of an individual coeffi- 
cient is using threshold strategy s = c, a, r, h with 0[+i = for I = 0, . . . , L given by: 



K'W = 1 " 7 ( 2> 2 A > R ° W = ( 1 + 2 A ' R oW = e 2 { 1 + 2 A + 5* 



1, 4. (27) 



4 r) (A) = 1 + 7(^A 2 )-2 7 Q, A 2 )+4A 2 [ci>(V2A)-3>(A)\ 

Proof: See appendix D. We denote by 7(0, x) = Jq s a ~ 1 e~ s ds. ■ 
As the representation of the image will be sparse the risk if no signal is present is important. From the corollary, 
and the asymptotic forms in A given in appendix D, we may note that the risk at the universal threshold when 
there is no signal present is of the same order for ^ c) (^21og(K)) and i?f l) ( v / 21og(i^)) even if the coefficient 
differs in favour of flf log (if)) whilst R { Q h) (^2log(K log(K))) and 

R^\y/2log(K)) correspond to different orders. The thresholds were introduced to improve the estimation of 



signals that were slightly more spread across coefficients than strictly necessary, but the risk for any coefficient 
when no signal is present is of similar enough nature to make the difference in estimation negligible (i.e. O ^ lo s^") j 

rather than O (^ xiog(K) ) • ^ ne exam pl es wm substantiate this claim. 

IV. Examples 

To examine the properties of the proposed methods, we have implemented simulation studies on images that 
can be retrieved at http://sipi.usc.edu/database/ (Tiffany and Boat), whilst (Lenna and MRIScan) 
are downloaded from http://www-stat.stanford.edu/~wavelab/. We used LA wavelets length 8. 
To compare our results, similarly to [5], we also implemented usual hard thresholding and the Wavelet-domain 
Hidden Markov Model (HMM method) proposed by the Rice group [13], where the software is available at 
http : //www-dsp . rice . edu/ software/, denoted by s = hmm. We used the code hdenoise.m with default 
settings, and daubcqf(8,' min'). We implemented the method at several Signal-to-Noise Ratio (SNRs) of 2 (very 
noisy), 4 and 8 (quite clean), with a set of images, i.e. Lenna (512 x 512 version), Boat (512 x 512), MRIScan 
(256 x 256) and the second channel of the colour image of Tiffany (512 x 512). The SNR is (as usual) given 
by SNR 2 = jv^EE^x- Table |U shows the result over repeated simulations. Reduced Mean Square Errors 
MSEs, and increased Peak Signal to Noise Ratios PSNRs (using the definition of [30]), are observed when using 
the proposed method with either hyperanalytic threshold criterion, and the reduction in MSE is of a respectable 
magnitude compared to variation across replications as the estimated standard deviations in the MSEs are usually 
considerably smaller. Overall the hypercomplex thresholding procedure is outperforming the Riesz thresholding as 
well as the other methods, apart from the boat image at high SNRs where the HMM does better. The Hypercomplex 
method is expected to outperform the Riesz method from the risk calculations, but not perhaps from our discussion 
in sections III-DI and III-EI The Riesz transform may appear more useful as it determines the prevalent direction 
from the image, whilst the Hypercomplex transform simply decreases the risk in estimation by considering variation 
associated with the same time-frequency (i.e. 1-D) behaviour in both axes separately, with the second variable treated 
as fixed. However, whilst the Riesz transform is suitable to use on locally unidirectional structure as discussed in 
section III-DI the hypercomplex transform treats variation in both axes, and images quite frequently have multi- 
directional variations present even locally. Some additional analysis of images has also been implemented in [31]. 
Consider two cuts from reconstructions to further elucidate on these results: see Figure ^ (a)-(h). We show the 
hypercomplex reconstructions only as the Riesz and hypercomplex reconstructions are similar. Clearly both [2(c) is 
more connected than (b), as is (g) to (f), whilst (d) and (h) has much remaining noise to preserve more detail. The 
proposed method performs quite well. In addition to the SNR's chosen for the full range of images, we implemented 
the procedure at the SNR chosen by Starck et al. [5], namely adding Gaussian noise with a standard deviation of 
20 to the raw Lenna image, or using a SNR of 5.58. The PSNR we observed in the noisy image (21.58) is less 
than theirs (22.13) but that is to be expected in noisy replications. We found for the methods tested in this paper 
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Fig. 2. The risk of hard thresholding compared to 'analytic' hard thresholding (a) where the standardised mean of the coefficient is 
denoted 9 = |#|cos(</>), where <f> = n/4. The risk associated with a thresholded coefficient using standard hard thresholding (solid line), 
the analytic threshol d (crosses), the Riesz thresholds (dotted line and dashed line) or the Hypercomplex threshold (dash-dotted line). In plot 
(b) 6»i = sf(f\ + 9j = \9\ /a/2 for the Riesz threshold whilst 6>i = 9 2 = 9 3 = 9 4 = \9\ /\/2, for the Hypercomplex threshold. In plot (c) 
92 = #3 = for the Riesz threshold whilst 9\ — #2 and #3 = #4 = for the Hypercomplex threshold. In plot (d) 9\ = \9\ cos(37r/8) 
#2 = 63 = \9\ sin(37r/8)/\/2 for the Riesz threshold whilst 2 = 0i = \0\ cos(3tt/8) and 9 3 = 9 4 = \0\ sin(37r/8) for the Hypercomplex 
threshold. For the 'analytic' procedure we use #4 as the second component. 



that averaged over 100 replications s = c (29.22), s = r (30.12), s = h (30.93) and s = hmm (30.48), where they 
obtained for s = c (28.35) and s = hmm (30.80). Starck obtained PSNRs between 29.99 to 31.95 by using local 
ridgelets and curvelets. Clearly the hypercomplex denoising performs on par with the algorithms suggested, and in 
addition the proposed procedure is both cheap to implement and extremely simple to code. 

TABLE I 

The average results over 50 runs. The symmlet wavelets (or LA wavelets were used) and J = 3. 



Example (SNR) 


Boat (2/4/8) 


Lena (2/4/8) 


Tiffany (2/4/8) 


MRIScan (2/4/8) 


average MSE (n)xlCT s 
sdMSE (n) xl(T 10 
PSNR (n) 


95.4/23.8/6.0 
26.3/6.6/1.6 
11.36/17.38/23.40 


95.4/23.8/6.0 
26.3/6.6/1.6 
10.67/16.69/22.71 


95.4/23.8/6.0 
26.3/6.6/1.6 
7.65/13.67/19.69 


381.5/95.4/23.8 
187.9/47.0/11.7 
16.96/22.98/29.00 


average MSE (c)xl(T s 
sdMSE (c) xl(T 10 
PSNR (c) 


8.0/4.4/2.2 
6.7/3.2/1.2 
22.12/24.68/27.70 


5.16/2.6/1.1 
7.3/2.3/0.8 
23.34/26.34/29.88 


3.8/2.2/1.0 
4.9/2.5/0.8 
21.64/24.02/27.39 


56.9/24.5/9.1 
110.0/43.4/15.3 
25.22/28.88/33.18 


average MSE (r)xl(T s 
sd MSE (r) xl(T 10 
PSNR (r) 


7.1/3.8/1.8 
6.2/2.7/1.0 
22.63/25.39/28.49 


4.5/2.2/0.93 
5.6/2.1/0.7 
23.96/27.09/30.77 


3.4/1.9/0.8 
3.8/2.4/0.7 
22.09/24.73/28.00 


46.9/19.7/7.3 
89.2/36.9/11.6 
26.06/29.82/34.17 


average MSE (h)xlCT 8 
sd MSE (h) xl(T 10 
PSNR (h) 


6.3/3.2/1.5 
6.6/2.2/0.9 
23.14/ 26.04/29.2 


3.9/1.9/0.77 

4.6/2.0/0.6 
24.56/27.74/31.58 


3.1/1.7/0.8 

3.3/2.3/0.6 
22.48/25.06/28.55 


40.4/16.6/6.1 

79.1/31.0/8.7 
26.71/30.57/34.92 


average MSE (hmm)xlCT 8 
sd MSE (hmm) xHT 10 
PSNR (hmm) 


6.7/3.0/1.4 

11.3/1.8/0.7 
22.90/26.38/29.62 


6.3/2.1/ 0.85 
16.0/2.3/0.5 
22.48/27.31/31.19 


5.4/1.8/0.8 
16.4/1.2/2.7 
20.12/24.78/28.26 


48.9/20.9/7.5 
82.1/56.9/7.4 
25.88/29.57/34.01 



V. CONCLUSIONS 

This paper has proposed a new thresholding strategy for estimating decomposition coefficients, and has in 
particular implemented the strategy with the discrete separable DWT. We have determined the stochastic properties 
of the decomposition of the noise, and both the deterministic (for some stylized image features) and stochastic 
properties of the suggested new thresholding criterion. We established universal thresholds. We calculated the 
risk theoretically, and for some specific choices of the mean provided plots of the risk showing that the proposed 
methods outperform standard denoising theoretically. We implemented the procedure on several examples at several 
SNRs, comparing the methods with the Hidden-Markov-Model used by the Rice group as well as standard hard 
thresholding, and found that the proposed algorithms offered improvements in most cases. Given the simplicity 
in implementation, and visually pleasing reconstructions, hyperanalytic denoising methods offer a computationally 
cheap improvement to existing methodology, as well as offers insight into 2-D variational structure. 
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Appendix 
A: Digital Implementation 



by: 



For future reference the Discrete Fourier Transform (DFT) and its inverse are given with fi = [— t^-, 2 



N-l N-l 



Q D (h,f 2 )=Ax 2 Y^ ^9(xiAx,x 2 Ax) e - MT)[Al , q x = ft Q D (i)e 2 '^ dH. 

11=0x2=0 J ^ n 



(28) 



N- 



Note that the value of Q D (/i,/ 2 ) at frequencies /1 G ^Sr, ■ ■ ■ ■ TTT 

AT/2 N/2-1 N/2-2 
' NAx ' NAx ' .VA.r ' • • • J 



1} is equivalent to the 
} , and the equivalent statement 

Q D (f)^(f)e 2 ^* d 2 f = 1, £ E + O ( i ) , 



value of Qd (fi, / 2 ) at frequencies /1 G | 
mutatis mutandis hold for / 2 . Let Qd,u = (a^o jv%i-) 



Mi=0 u 2 =0 



s = r, h, and 2 = 1, . . . , L, where we define: 



Ul /(NAx) 



V, 



(r,l) 



D.u 



iyJ{u l /(NAx)Y+(l 2 /{NAx)Y 
(N- Ul )/(NAx) 

i^/({N-u 1 )/NAx) 2 + (u 2 /NAx) 2 
Ul /(NAx) 

i\/(u 1 /(NAx)) 2 +((N-u 2 )/(NAx)) 2 
(N—ui)/(NAx) 

k i^/((N- Ul )/NAx) 2 +{{N-u 2 )/NAx) 2 



if m = 

if ui = 1,... ,N/2, u 2 = 1,... ,N/2 

if Ul = N/2 + 1,... ,N- 1, u 2 = 1,... ,N/2 

if m = 1, . . . , JV/2, u 2 = AT/2 + 1, . . . , N - 1 

if Ul = N/2 + 1,... ,N- 1, u 2 = 1,... ,iV/2 



We define in an analogous fashion. The digital definition of the filters corresponding to the three Hypercom- 

plex components are defined via: 



V, 



(h,l) 



if ui = 0, iV/2 

-i if ui = I,..., N/2 - 1 ,1 = 1,2, 

1 if uj = JV/2 + l,...,JV-l 



and = Vd'^Vdu- Implementing the discrete HCT introduces an error term of O (1/N) . 

B: DWT of Typical Deterministic Image Features 
We use period boundary treatment (see Mallat [17] [p. 282-292]) when implementing the MODWT, so that with 

£x = [j,u,x\ T : 



(<?) 



W. 



Proof of Lemma [7] 

By direct calculation using equation d29l i: 

o*(xo) 



ff ijU (f)Q D (f)e 2 ™ f2x dH. 



ft 



(29) 



m _ f) + gfa + f )) F . )u(f)e 2^x d 2 f + Q 

1 



2^(x ) F i)U (f ) cos(2vr(f J x - ^>(f ))) +0[- 



(r,u) 
U 2 



and similarly u^'^ = 2 jf ai(xo) Hj iU (fo) sin(^o) sin(27r(f0 , x — ^- )U (fo))) + O (i) , whilst p\ is introduced when 



2> 



a(x ) 



i)sga(/i)^ (<J(f - f) + <5(f + f)) ^ iU (f)e 2i7rfTx d 2 f 



f2 



i-O I ^ ] = 2 J a t (x ) ^>(fo) cos(0 Q )sm(27r(f o J x-^ > (f o ))) + O 



is approximated by an exact band-pass structure. 
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Proof of Lemma |2 

By direct calculation, using equation (I2UI 1. it follows that: 

$' U) = Vjj ^(-tjBgnC/O («5(f - f) + <5(f + f)) ^, u (f)e 2 - frx dH 
+0 (J^j = 2% t (x ) |^>(f )| sin(2^(f T x - ^, u (f ))) + O (J^J . 

Similarly pf' u) = 2% 4 (x ) #,>(fo) sin(2vr(f ( f x - <£,>(fo))) + O (j?) and 
$ ' u) = 2^a i (x ) |^>(fo)| cos(2vr(f T x - ^, u (f ))) + O (£) . 

Proof of Lemma 

Assume for simplicity of exposition, < # < 7r/2, but with the necessary notational changes no such restriction 
needs to be made. Define the rotation matrix by rg = [cos(0) — sin(0), sin(0) cos(0)] , and the change of variable 
given by: v = f(— 9) = r_gf. We assume that A e {f) decays for large frequencies, and consider A e (f) = 
A e (0)5(f - 0). For example if a e (x) = I (x £ [0,L)) /y/Z then A e (f) = e ~"'l^ fL) , that as L -> oo 

A e (f) will concentrate to / = 0. E(f) = A ' ff^ ^ e~ 27Tic ^/ sin W , and with O(0) the rotated by 6> 
version of fi, with i^ min = min^ v = sec(0)fil (|tan(0)/i| < ^) J(|/i| < 335) and ^i, max = max v = 
sec(0)/il (|tan(0)/i| <^)/ (|/i| < ^) , then 

Mi = V [ [ g,- B (f) ^ WV^^ e- 2 ^/^) e 2 "^' d'f + Q f i) 
V ' |sm(6»)| V^y 

= 11 H u ( T ._ tfI ,.) 2JAe (~^/ Sin ( g )) c -2^(fain(g)^+co S fg)i/ 3 k/sinW-i/ r r- < ,x) ^ p |" M 

7 7n(«) J ' U |sin(6>) | ' V^V 

( = } 2^ e (0) ^ l maX A}, „(cos(0K siii(e)^) gWcosW^+sinW^-c) ^ + p3 + ^ 

The approximation in (1) relies on A e {-) taking the form of a 5 distribution contribution, i.e. a e {-) constant over 
a large spatial domain, but a slowly varying a e (-) will approximately yield the same result. Also we may find 
approximate descriptions for decomposition of the RTs, namely with / = yj f 2 + /| and v = \Jv\ + v\ : 

>,«) = 2 i /" f E iu {f){-i)hlf - b^Zg/Mg)) e -=hrfc/,/Bin(g) e 2^x d 2 f 

7 Ai ' |sin(0)| 
+0 ^-ij ( = } ,4 e (0)2 J '(-i)cos(#) ^" 1,m " iT jiU (cos(%i,sin(%i) e ^(c«(*)*i-Mi»W*,-c) 

xsgnOx) + ^ + O (1) = cos(0)^ e) + p' A + O (J-) . (30) 

The approximation in (1) relies on the envelope being constant in the spatial domain - a slowly varying envelope 
will thus only approximately yield the same value, and this introduces an error term p' 4 . Similarly it transpires that 
^(r> u ) = sin(6)U^ + p-j + O (-^) , and thus the result follows, with a new error term p$. 

Proof of Lemma 

Also we may find approximate descriptions for the Hypercomplex components, namely: 

(h,u) = 2j r r k . (f)H)sgn(/l) A e (-h(9)/M-e)) e _ 2nicf2/sin{e) e2nifTx dH 

J Jo. |sm(0)| 
+0 (J^j ^ 2*(±l)(-t) J" 1 '™* H jtU {cos{e)v x ,sin{e)u 1 )e- 2 ' Kicv ' e 27rw 1 (^(e)x 1 + S m(e)x 2 ) 

xsgn (u!) du! + p s +0 (^) = (±)U^ +p 8 + (J^j , (31) 

where the value of ± depends on the value of 8. Similarly it transpires that = + pg + O (-^) and 

/4 = ±W^ + pio + O (77) • Ps, P9 and pw are constants depending on the variability of a e (x). 
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C: Statistics of the Normal Vector 

For simplicity set Ax = 1 when deriving the statistical properties of the coefficients. e x has a spectral represen- 
tation: e x = J n dZ e (i)e 2i7TfTx , where Z e (f) is a complex-valued orthogonal increment process, see [32] [p. 244], 
i.e. E (dZ e (f)dZ*(f)) = if f ^ f . The DWT is represented by subsampling the MODWT: 



n 



(s,u) 



w. 



J • • • ! " ' 



(e,s,L) 



noting that: 



COV |„Wl.«) „('W 



2 2 W(n Wl ' n) ,n (s ' /2 ' n) 



(32) 



We have that n[ s ' u) = J n fl^jf) dZ £ (f)e 2 * tf!r *, H jA (f) = H^H^h), H jt2 (f) = ^•(/ 1 )G i (/ 2 ) ; fl^ (f) = 
Gj(fi)Hj(f2) and .£^4 (f) = Gj(f\)Gj(f2). We approximate the magnitude of the wavelet filters as exact bandpass 
filters - see for example Nielsen [23] for a discussion of such approximations, and optimal filters to use. That is: 

2 _ / ^ ^ l/l ^ [25+1' 23") rWf N | 2 =/^ ^ ^ ^ ( — 2J+ 1 "' 2J+ 1 ") 

if |/|€ [s^r.i), ' jUj I if |/| € [3^, J) . 



^ (/) 



Var (n^) = 2 2 ^ (J j j j H hU (f) dZ^e^^H^ (f) ^*(f')) = a 2 



Proof of Propositions [7] <& HI 



(33) 



var n 



4 J 



2 2 ^^y y yy ^^(o^jfOe^^-^^^K^^Kf) 



2 2i 



V^' l \i) H j>u (f) 



2 2 i / ^C«.9(f) 2 Hj, u (f) 



d 2 f 



2 dH + o(±)^a, 



(s,u) 



The latter defining a, , to be explicitly determined for s = r, h and u = 1, 2, 3, 4. For /1 7^ /2, where l\, I2 7^ 0, 
we determine that: 

2 



, (s,u) (s.u) 

cov I n ; , nl 2 



dH 



^ [ a 2 V^ h \i)V^ M *(i) H j>u (f) 

^EE«*+o(^)=»(s 



(34) 



by property ^ as ^ is separable. Therefore 



E E 



n 



(s,u)2 



22j / Ek (s,i) (f) 

2^4V|^ iU (f)| 2 d 2 f + o(l) 
C^ar (n^> 2 ) + o(l 



(35) 

Hence the total energy of the noise associated with the total magnitude square of the added quadrature components 
is a constant times the variance of the original signal. 



(s.u) (s.u) 
cov I nl n\ 



22jE {l LI L Hj ' um ^ {f,)dz " {f) 

dZ* e (f')e 2ni{f - f ' )x ^ = y a 2 V (s > l \¥)H jtU {{)H* :U {f) d 2 i 

«(*)=£>W+<>(*) =0 (* 
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Proof of Lemma \5\ 

Given the noise was Gaussian and zero-mean we only need to determine the second order structure to deduce 
the Lemma, using equation d32t . 



Var 



H* (f[) H* (&) dZ e {?) 



fx 



+ o 



N 



2 2 V 



/f Hj (h) 2 Hj (/ 2 ) 



ft + fi 



2 ^ + o(Ly 



Var ( n 



Hence it follows that 



(T 



/I 



ffj (/l) (/2 



Var ( n 



+ Var ( n 



Var 



and the two variances are obviously equal. Also: 



Var I n. 



(r,u) 



+ Var ( n 



(r,u) 



Var ( n 



Xr,u) 

h j i 



u = 1,2,3,4, 



(36) 



(37) 



Var ( n 



2 2j+2 a 2 



/i 



tan 



/f + fl 



df 2 dh + o[- 



h 



dfi + O 



N 



a 



I - 1 to °" <2 > + \ ta,r 1 G) - (i - \ < 2) + i tan " (CO) 



v 



v 



Var ( n 



(r,u) 



Var ( n. 



,(r,2) 



Finally note that: 



Var I n 



,(r,3) 



Var ( n 



,(r>M) 



tan 



f 2 / 1 



+ o 



fi + fi 
i 

v 



- 2 ' - + 2tan~ 1 (- 



a 2 a^ + 



0.8737cr 2 . 



= a 2 a (r ' 3) « 0.1263a 2 , 



A 2 



/i 2 + /I 



dh dfi + O 



N 



a 2 [l 



>,2) 



+ o 



N 



Clearly we can find the variance of the second RT by permuting the order of the spatial variable in the integration, 
and this then completes the variance calculations. From the proofs of propositions Q and © we can note that 
the components of n( r ' n ) for u = 1, 2, 3, 4 are uncorrected up to O (i) , and this can also be shown by 
direct calculation, mutatis mutandis the calculations given above. Thus as e x was zero-mean Gaussian, and we are 
forming linear combinations to obtain n^ r ' u \ the stated result follows from the expressions for the covariances of 
the components. 
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Proof of Lemma |6| 

a) First consider u = 1, 4 so that the variance of the two Riesz components is 1/2. Then by Lemma [5] it follows 
directly that (c[ s) + l)M^' r)2 = Z^ T Z^ + (■£.) , T x = Z^ T Z^ ~ Xi + 3X1 + 2X1- ?i has a Moment 



Generating Function (MGF) given by (s) 
that Tx does not exceed A 2 is given by 

P (Ti < A 2 ) = 1 + 



s/T=2s 1-s' 



, and thus f Tl {t) = e~*-^ J - / 2 e" 2 d«. The probability 



2 2 M 2 J 

— =e (in 

/7T 



2tt 



x 2 e 2 dx 



(38) 



where the cdf be found in Grad & Solomon [28][p. 472], and the function is expanded as A — > oo. 

b) Consider now u = 2, 3. Wlog assume that a^ r ' u ^ > 1/2, and otherwise relabel a( r ' M ) and 1 — a^ r ' u ^ suitably. 
If a ^ = 1 - a( r -") this collapses to the case given above. Define T = T x /2 so that T = £)f =1 a^X 2 where the 
Xj are iid Gaussian random variates with zero mean and unit variance, where Ya=i a « = 1- Then the MGF is by 
Grad & Solomon [28] 

i i 3 



M T ,(s)=M Tl (s/2) 



1 - 2(1 



Jr,u)< 



s/2 



1 - 2a (r ' u h/2) ~ 2 (1 - 2s/2)~* = ]J (1 - 2a { a)~3 , 



with oi = (1 - a( r >"))/2, a 2 = /2, a 3 = 1/2. Defining ci = 2/(1 



z=i 



C2 



and hence as we assumed 1 > a^ r,u > > 1 



2/a 



(r,u) 



C3 



we may note that ci > C2 > C3 and thus agrees with [28] 's 



notation. For future reference note that c\ + C2 = 2/(a( r ' M )(l — a^ r ' u ^)), u = 1,...,4, and we may rewrite 
Mt'(s) = n|=i (1 ~~ 2s/cj)~ 1 ^ 2 . Using results from Grad and Solomon we may determine: 



F Tl (t) = F T , (t/2) 



ofl) 



For suitably defined constant A From these formulae we can thus consider the probability of an observation 
exceeding a large threshold, which will be necessary for the selection of an appropriate threshold. 



Proof of Lemma [7| 
We establish 



Var ( n 



,(M) 



Var ( n 



(h,u) 
l l 



u 



1,2,3,4, I = 1,2,3,4. 



These results follow trivially from the form of the partial HT [21]. From the proofs of propositions (Q and © we 
can note that the components of n^ h ' u ^ for a fixed value of u = 1, 2, 3, 4 are uncorrected up to O (i) , and this 
can also be shown by direct calculation mutatis mutandis the calculations given above. 

Proof of Lemma |6| 

K wavelet coefficients will be thresholded where K\ magnitudes have the distribution given when u = 1,4 and 
K2 have the distribution that follows from u = 2,3, where K\ + K2 = K, and K\, K2, K3 = O(K), so that by 
Dykstra [26] 

lK 3 



P {M K < \ 2 K ) 



1 



7T \ K 




1 



^ e -A 2 K/2 
A_ftT 



ignoring o(l) terms in X for suitably chosen constant A2, if Xk = O (log [If]). Thus with X 2 K = 2 log [if] + 
C 2 log [log [if]], 

K 3 



P (M K < \ 2 K ) 



1 _ A 2 e- X */ 2 



1 

Ak 



1 _ 4 e -(21og[^]+C 2 log[log[X]])/2 



V21og [K]+C7 2 log [log [K]] 
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if C 2 /2 > -1/2, i.e. C 2 > -1, and we take C 2 = 0. 



D: Risk Calculations 

Proof of Theorem 

We firstly note from [29] [p. 293] that the risk of regular hard thresholding is given by 



4 C) (A) 



(to+6»!) 2 >A 2 



w 2 (f){w) dw + e\ 



(w) dw 



(io+6»!) 2 <A 2 



1 + / [6\ - w 2 ] (f>(w) dw. 

'(w+6» 1 ) 2 <A 2 



We may then note that the risk for the hyperanalytic threshold with var ( n 
of takes the value 1 or a^ a,u \ is given by: 



(s,u)\ _ OJl 



n 



(T crf, for I = 0, 



(39) 
,L, where 



i f / [el-wDHa^J^] 

/ E i K+e,) 2 <A 2 Y ^ ' 



07 



(40) 



Proof of Corollary [7| 



1 } 2 



+ 0(A- 3 ) . (41) 



J«i 2 +«i 2 <A 2 27r V 2 / 

Furthermore the risk at 6 = can also be found for the other hyperanalytic thresholds. We note that for s = r and 
u = 1, 4, denoted by s = r\ in the Figures: 

o 2 



<°(A) = I- [ 



»'7 



2 <a 2 (V27t) 



:e -iM+2(»i+»3 2 )) d?JJl dw2 dw3 



a^ 1 (v^F) 3 



e 2-1 



ui 2 +ui 2 <A 2 — ui 2 



(42) 



1 + 7 (l/2, ^A 2 ) + 4A 2 (*( V2X) - $(A)) - 2 7 (l/2, A 2 ) = e"^ ^ + O (A~ 3 )^j , 



whilst for u = 2, 3 denoted by s = r 2 , 



< 2) (A) = l-/ y 
< } (A) = 



T,i w f<^ 2 (\/2^) 3 v / a( r '")( 1 - a (r ' n) ) 



e 2 



, — e 2 ^ ^ 



A /•27T 




L - / / / / r 2 cos 2 (fl) 

r-A 



e 2? " r 3 sin 2 (6*) sin(^) cfr cf6> # dA 



(43) 



4vr 2 



I - rV^dr = l-i /"^ 4 S V^ S = e"^ 2 (l + ^A 2 + ^A 4 



